feat(read): implement read builtin with IFS field-splitting#211
feat(read): implement read builtin with IFS field-splitting#211
Conversation
Adds the `read` shell builtin, which consumes a delimited chunk from
stdin and assigns the IFS-split fields to one or more shell variables
(defaulting to REPLY when no name is given). This is the first builtin
in the codebase that mutates parent-shell state, so the change extends
the builtin contract with a minimal pair of capabilities.
Contract change (builtins.CallContext):
- SetVar(name, value) error: assign to a shell variable; routes
through the runner's setVarErr so MaxVarBytes and
MaxTotalVarsBytes are enforced.
- GetVar(name) (string, bool): read a shell variable (used to
consult IFS for field-splitting).
Wired at both CallContext construction sites in interp/runner_exec.go.
Builtin (builtins/read/read.go) supports:
-r raw mode (no backslash interpretation)
-p PROMPT print to stderr (only when stdin is a TTY, matches bash)
-d DELIM single-rune line delimiter; -d '' uses NUL
-n NCHARS read at most N characters (UTF-8 aware), capped at 1 MiB
-N NBYTES read exactly N bytes ignoring delimiters, capped at 1 MiB
-t SECONDS timeout via *os.File.SetReadDeadline (or context.WithTimeout
fallback); returns 142 on timeout to match bash
--help usage to stdout
Excluded from v1: -a (would need SetArrayVar), -s/-u/-e/-i (interactive
features that don't apply to a non-interactive shell). Documented in
SHELL_FEATURES.md.
Field-splitting follows POSIX read semantics: whitespace IFS chars
coalesce, non-whitespace IFS chars introduce separate fields, leading
and trailing IFS-whitespace is stripped, and the last variable absorbs
the remainder when there are more fields than names.
Bytes are read one at a time without buffering so subsequent reads from
the same stdin observe unconsumed bytes (a buffered reader would
swallow them past the delimiter).
Tests: 16 new scenarios under tests/scenarios/cmd/read/ covering
basic, flags, ifs, errors, and help. Three help scenarios bumped from
28 to 29 builtins; the obsolete read-is-blocked scenario removed.
Symbol allowlists updated for the new package's imports.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builtins must not call time.Now directly; the canonical pattern is callCtx.Now (script-start reference, captured once per Run for stable multi-builtin reasoning) for things like find -mtime / ls -l. For read -t SECONDS we want a true monotonic "from now" deadline, not a script-start reference (a 5s timeout 10s into the script must not fire immediately). callCtx.Now is therefore wrong here. The right answer is to read the deadline already computed by context.WithTimeout — that call is on the symbol allowlist, the stdlib does the time.Now lookup internally, and ctx.Deadline() lets us project that same instant onto *os.File.SetReadDeadline so kernel and context deadlines stay in sync. Removes time.Now from both the global builtin allowlist and the read per-command sublist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cbf514b698
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Four bash-compatibility fixes flagged by codex:
1. Preserve unsplit input for REPLY and -N
When no NAME is given, bash assigns the raw line to REPLY without
IFS trimming or splitting. With -N, bash assigns the whole read to
the first NAME and leaves remaining NAMEs empty (no IFS splitting).
Both paths now skip splitIFS entirely.
2. Count -N input as characters, not raw bytes
Bash's -N is character-based, not byte-based: read -N 1 over "éx"
stores "é" (2 bytes), not the first byte of the UTF-8 sequence.
Dropped the dedicated byte-mode branch and unified -N with the
default rune+escape path; -N differs only in that it ignores the
delimiter rune. Backslash handling in -N now matches bash too
(escapes processed unless -r).
3. Permit values exactly at the read byte cap
The cap check fired before the -n / delimiter checks, so a value of
exactly MaxReadBytes was rejected on the next iteration even though
the flag validation and MaxVarBytes both allow that size. The cap
now fires only when an append would push past it, so a 1 MiB
value/line followed by EOF or the delimiter returns successfully.
4. Handle a single trailing IFS separator
`printf 'a:b:\n' | { IFS=: read a b; }` should give b="b" because
the trailing colon represents an empty trailing field that bash
silently drops. splitIFS now strips a lone trailing non-ws IFS
character from the last absorbed field when it is the only such
character in that field. Verified across the spectrum: a:b: → b,
a:b:: → b::, a::b: → :b: (only one strip when there is exactly
one trailing IFS char in the absorbed remainder).
New scenarios:
basic/reply_preserves_padding.yaml
flags/N_counts_chars_utf8.yaml
flags/N_applies_escape.yaml
flags/N_skips_field_split.yaml
ifs/trailing_single_ifs_stripped.yaml
All 22 read scenarios pass; full suite green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bash comparison test runs in debian:bookworm-slim's default C/POSIX locale, where bash's `read -N` counts bytes (so `read -N 1` over `éx` returns the first byte 0xC3 — an invalid UTF-8 fragment). rshell always counts UTF-8 characters regardless of locale, so the test assertion `[é]` matched bash on macOS (UTF-8 locale) but not in CI. Setting LC_ALL=C.UTF-8 for the bash side puts bash in UTF-8 mode, making the comparison meaningful: both shells now treat -N as character-based on UTF-8 input. Verified locally with `LC_ALL=C bash` (returns `[\xc3]`) vs `LC_ALL=C.UTF-8 bash` (returns `[é]`). The intentional divergence in C-locale environments is documented in the scenario comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 388464302b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Three bash-compatibility fixes from the latest codex review:
1. Preserve partial input on read timeouts
When `-t` fires after some data has been read but before the
delimiter, bash assigns the partial line and returns 142. Previous
implementation returned 142 immediately and discarded the bytes.
Restructured run() to track a `timedOut` flag and fall through to
the assignment path when there is data, surfacing 142 only at the
end. Empty result + timeout still returns 142 with no assignment.
2. Handle `-t 0` as a non-blocking poll
Bash's `read -t 0` returns 0 if input is immediately available
(data buffered or EOF), 142 otherwise — no waiting, no consumption.
Previously `context.WithTimeout(ctx, 0)` produced an already-expired
context that always reported timeout.
Added pollAvailable() that sets an in-the-past read deadline on
*os.File stdin (computed from c.Now to avoid a direct time.Now
call) and probes for one byte. The byte is consumed on success — an
intentional divergence from bash, documented inline; matching bash
exactly would require a platform-specific select(2) call.
3. Use a real isatty check for the -p prompt
`os.ModeCharDevice` matched any character device including
/dev/null, so `read -p PROMPT x </dev/null` would emit the prompt;
bash suppresses it. Replaced with golang.org/x/term.IsTerminal,
which uses the platform's TIOCGETA / GetConsoleMode and returns
false for /dev/null, pipes, and regular files.
Symbol allowlist updates:
+ golang.org/x/term.IsTerminal (global ceiling + read sublist)
+ time.Hour (read sublist; in-the-past poll deadline)
- os.ModeCharDevice (no longer needed; removed from global ceiling
and read sublist).
go.mod gains a direct dep on golang.org/x/term.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-13b845 # Conflicts: # SHELL_FEATURES.md # analysis/symbols_builtins.go # interp/register_builtins.go # tests/scenarios/cmd/help/restricted.yaml # tests/scenarios/cmd/help/restricted_all_flag.yaml # tests/scenarios/cmd/help/unrestricted.yaml # tests/scenarios/cmd/help/unrestricted_all_flag.yaml
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a82f8e2890
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Three bash-compatibility fixes from the latest codex review:
1. Defer invalid-name failure until assignment time
For `printf 'a b c\n' | { read good 1bad mid; }` bash assigns
good=a, prints an error for 1bad, leaves mid untouched, and
returns 1. Previously rshell rejected the whole call before the
read ran, so even good was left empty. The validation has been
moved into the assignment loop: each name is checked just before
its SetVar call, and the error message now matches bash's exact
format (`read: \`1bad': not a valid identifier`).
2. Honour the last of -n / -N when both are specified
bash applies last-set-wins: `read -n 1 -N 2` reads 2 chars,
`read -N 2 -n 1` reads 1. The previous implementation always
chose -N when present. pflag's `Changed()` only reports whether
a flag was set, not relative parse order, so we install a
custom `pflag.Value` (`orderedIntValue`) that records, on each
Set(), a monotonic counter shared between the two flags. The
handler picks the flag with the larger counter.
3. Don't consume input for `read -t 0` (Unix)
bash's `read -t 0` is a non-consuming availability probe, so
`read -t 0 && read line` over `printf "hello\n"` should leave
`line=hello`. The previous implementation set a deadline in the
past and read one byte, which leaked the byte on success. Added
a platform-specific `pollInputNonConsuming` helper using
`unix.Poll(fds, 0)` for Unix; Windows continues to fall back to
the consume-based probe (documented as a follow-up).
Symbol allowlist updates:
+ golang.org/x/sys/unix.{Poll,PollFd,POLLIN,POLLHUP}
+ strconv.Atoi / strconv.Itoa (used by orderedIntValue)
New scenarios:
errors/invalid_name_after_valid.yaml
flags/n_N_last_wins.yaml
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 88b60b03dd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
When read hits EOF or times out before any data, bash assigns "" to
every requested NAME (clearing any prior value) and then returns the
non-zero exit code. The previous early-return on `line == ""` skipped
the assignment loop, so `X=old; read X </dev/null` left X=old instead
of clearing it.
Removed the early return; the assignment loop now runs with the
zero-valued `values` slice, which writes "" to each variable. The
existing exit-code logic at the end of run() then surfaces 1 (EOF) or
142 (timeout) as appropriate.
Verified bash matches:
X=old; read X </dev/null → rc=1, X=""
A=oldA B=oldB; read A B </dev/null → rc=1, A="", B=""
good=oldgood mid=oldmid;
read good 1bad mid </dev/null → rc=1, good="", mid=oldmid
(loop stops at 1bad)
{ sleep 0.2; printf "x\n"; } | { X=old;
read -t 0.05 X; } → rc=142, X=""
New scenario at tests/scenarios/cmd/read/basic/eof_clears_existing_var.yaml
locks in the multi-var EOF-clears-existing-value behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2c197eb924
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Bash's line-continuation behaviour for backslash-delim only applies when the delimiter is newline. With a custom -d delimiter, `\<delim>` is an escape — the delimiter character is appended literally: `printf 'a\,b,c' | read -d , x` should assign `a,b`. The previous code treated any `\<delim>` as continuation, dropping both bytes and silently losing data for any escaped custom-delim input. Tightened the line-continuation check to require `delim == '\n'` so non-newline delimiters fall through to the regular escape branch (backslash dropped, next rune appended). Verified against bash: printf 'a\,b,c' | read -d , x → x="a,b" printf 'a\,b,c' | read -d , -r x → x="a\\" printf 'a\\\nb\n' | read x → x="ab" (continuation, unchanged) printf 'a\\bc\n' | read x → x="abc" (regular escape, unchanged) New scenario at tests/scenarios/cmd/read/flags/escape_custom_delim.yaml. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c9e80ef816
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c9e80ef816
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
💡 Codex ReviewLines 416 to 421 in c9e80ef When Lines 500 to 501 in c9e80ef For non-raw reads, an input made of repeated backslash-newline pairs never appends to ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Two fixes from the latest codex pass: 1. Honour run deadlines while read is blocked (P1) When read waited on idle stdin without its own -t flag, an inherited context deadline (interp.MaxExecutionTime, CLI --timeout, parent ctx cancellation) was only checked between bytes, not inside the blocking r.Read syscall. A blocking terminal/pipe read could hang an otherwise time-bounded run. Lifted the SetReadDeadline propagation out of the `if -t` block so it fires whenever readCtx has any deadline. context.WithTimeout already returns the earlier of -t and the inherited deadline via ctx.Deadline(), so a single SetReadDeadline call covers both cases. Plumbed this for *os.File stdin (the runner's typical pipe shape); other readers still rely on the per-byte ctx.Err() check. 2. Cap consumed bytes for continuations (P2) Non-raw input made of repeated backslash-newline pairs never appended to buf and never incremented runes, so the 1 MiB MaxReadBytes cap (which only checked output buffer size) never fired while read drained the stream looking for a logical terminator. With attacker-controlled stdin this could burn unbounded CPU/I/O. Added a `consumed` counter that increments on every successful readByte. The cap is now checked in readByte itself, so any sequence of input — including discarded continuations or escape bytes — is bounded at MaxReadBytes total. The existing tryAppend size check is retained for defence in depth. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d19f22da19
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Two fixes from the latest codex pass:
1. Allow max-sized records to reach EOF or delimiter (P2)
The new consumed-bytes cap fired with `>=` MaxReadBytes, which
meant a 1 MiB line followed by its newline errored out before
read could observe the terminator. The cap should fire only when
another byte would push us past it. Changed the check to `>` so
reading exactly MaxReadBytes data bytes plus a 1-byte delimiter
succeeds. Drain attacks still bounded — they error after at most
one byte over the cap.
2. Reject non-finite / oversized -t timeouts (P2)
`strconv.ParseFloat` accepts NaN, Inf, scientific notation, and
values whose nanosecond conversion overflows time.Duration into
a negative duration that fires immediately. Bash rejects all of
these as "invalid timeout specification".
Added parseReadTimeout that pre-validates the timeout string is
non-empty and contains only `[0-9.]` with at most one '.', then
ParseFloat for the numeric value, then NaN/Inf/overflow guards.
The error message now matches bash:
read: NaN: invalid timeout specification
Verified bash rejects: NaN, Inf, 1e6, 1e20, -5, notanumber
Verified bash accepts: 1, 1.5, 0.5, .5, 1., 100000000
Symbol allowlist: math.IsNaN, math.IsInf, math.MaxInt64 added to
the read sublist.
Updated tests/scenarios/cmd/read/errors/invalid_timeout.yaml to
exercise NaN, Inf, scientific notation, and negative inputs and
to assert the bash-style error message.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9a240eb697
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Two fixes from the latest codex pass:
1. Skip NUL bytes unless NUL is the delimiter (P2)
Bash silently strips embedded NUL bytes from the assigned value
regardless of mode. The only case where NUL matters is `-d ''`
(NUL delimiter), and that path is handled by the delim check
above the new NUL skip. Verified bash produces "abc" for
`printf 'a\0b\0c\n' | read x` and "abcd" for `read -N 5 -d ''
<<< 'ab\0cd'` — neither retains the NUL byte.
Added an `if rn == 0 { continue }` after the delim check in
readInput. The NUL byte is still counted toward `consumed` (so
pathological NUL-stuffed input remains bounded) but is not
appended and does not increment `runes`.
2. Preserve abort semantics on variable storage exhaustion (P2)
AST-level assignment treats `errTotalVarStorageExceeded` from
the writeEnv as a script-aborting resource-cap violation
(interp/vars.go: sets `r.exit.exiting = true`). The previous
`read` SetVar wrapper called `r.setVarErr` and returned the
raw error to the builtin, which surfaced it as a regular
status-1 failure — letting the script continue past the cap
and weakening the DoS guard.
Added a public sentinel `builtins.ErrVarStorageExceeded`. The
SetVar wrapper now translates the private interp sentinel to
the public one (preserving the original error message via
wrapping). `read` checks `errors.Is(err, builtins.ErrVarStorageExceeded)`
and returns `Result{Code: 1, Exiting: true}` so the runner
stops the script, matching AST-assignment behaviour.
New scenario: tests/scenarios/cmd/read/basic/strips_nul_bytes.yaml
locks in the NUL-strip behaviour against bash.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex review this PR (iteration 7). P0/P1/P2/P3 prefix. |
|
Iteration 7 self-review result: 0 findings. SUCCESS_COUNT=4/5. |
|
@codex review this PR (iteration 8). P0/P1/P2/P3 prefix. |
|
Iteration 8 self-review result: 0 findings. SUCCESS_COUNT=5/5 — loop reached clean terminator. |
Review-Fix Loop Summary
Iteration log
Findings fixed across the loopIteration 1 (self + codex):
Iteration 2 (codex):
Iteration 3 (codex):
Iterations 4–8: zero findings; codebase converged. Final state
Remaining issuesNone — PR is ready for human review and merge. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ac7919fbc0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…fields Codex P2: when the last assigned field's absorbed remainder contains internal IFS characters (ws or non-ws), bash preserves a trailing non-ws IFS character verbatim. The previous splitIFS rule "strip a lone trailing non-ws IFS char if exactly one such char in the absorbed string" was incorrect — `IFS=' ,' read a <<< 'b a,'` should produce a="b a," (the comma is preserved because the space is internal IFS-ws), but rshell stripped the comma yielding a="b a". Fix: tighten the strip rule. Strip the trailing non-ws IFS char only when the candidate string (s minus that char, then trim trailing IFS-ws) contains NO IFS characters at all — i.e., a single-field shape. Multiple internal IFS chars (any kind) means the absorbed value represents data from multiple fields and is preserved verbatim including the trailing delim. Verified empirically against bash 5.2: IFS=' ,' read a <<< 'b a,' → a="b a," (preserve) IFS=' ,' read a <<< 'b,a,' → a="b,a," (preserve) IFS=' ,' read a b <<< 'a b c,' → b="b c," (preserve) IFS=' ,' read a <<< 'a,' → a="a" (strip, single-field) IFS=' ,' read a <<< 'a ,' → a="a" (strip, single-field) IFS=':' read a <<< 'b a:' → a="b a" (strip, ' ' not IFS) Add ifs/preserve_trailing_delim_multifield.yaml covering the four distinct preserve/strip cases. The existing trailing_single_ifs_stripped and separator_plus_spaces scenarios continue to pass under the refined rule (their inputs are all single-field shape). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-13b845 # Conflicts: # analysis/symbols_builtins.go # builtins/builtins.go # interp/runner_exec.go
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6b131c9228
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex conduct a comprehensive security and code review |
💡 Codex ReviewLines 508 to 509 in 6b131c9 On non-Unix platforms, ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Codex P2: on non-Unix platforms, pollAvailable's fallback returned
Code 1 unconditionally, so Windows rshell always reported `read -t 0`
as "no data" for non-regular stdin even when buffered bytes were
sitting in a pipe. Scripts using `printf x | { read -t 0 && read v; }`
took the false branch on Windows while succeeding on bash/Linux.
Implement non-consuming readiness probing on Windows via a new
internal helper package:
- builtins/internal/winpoll/winpoll_windows.go uses
GetFileType(handle) to dispatch by descriptor type:
* FILE_TYPE_DISK → always available (regular files don't block).
* FILE_TYPE_PIPE → PeekNamedPipe via kernel32.dll reports the
buffered byte count without consuming any
data; lpTotalBytesAvail > 0 ⇒ available.
* FILE_TYPE_CHAR → GetNumberOfConsoleInputEvents reports
queued console-input event count (close
enough; counts non-character events too,
but matches bash-on-WSL heuristics).
* other → unsupported, caller falls back to Code 1.
- builtins/internal/winpoll/winpoll_other.go is a no-op stub for
non-Windows builds.
- builtins/read/poll_other.go now delegates to winpoll.PollNonConsuming
instead of returning unsupported unconditionally.
The PeekNamedPipe call uses unsafe.Pointer to pass &avail through
syscall.SyscallN — narrow exception isolated to builtins/internal/
(per the same convention as winnet for iphlpapi.dll). No pointer
arithmetic occurs; the returned uint32 is consumed directly.
Update internalPerPackageSymbols and internalAllowedSymbols to
allowlist the new Windows-specific symbols (GetFileType,
GetNumberOfConsoleInputEvents, FILE_TYPE_* constants, Handle) plus
the existing syscall.MustLoadDLL and unsafe.Pointer.
Verified: GOOS=windows go build ./... succeeds; analysis tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
[Claude Opus 4.7 (1M)] Addressed in 76cab76. Implemented non-consuming readiness probing on Windows via a new internal helper package
Cross-build verified: |
|
@codex conduct a comprehensive security and code review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76cab766b8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Codex P2: on Unix, the poll path treats POLLHUP (peer hung up,
no buffered data) as available for `read -t 0` because bash 5.2
considers EOF as "I can complete the read without blocking" and
returns rc=0. The Windows winpoll implementation only reported
`avail > 0`, so `printf '' | { read -t 0 v; }` reported exit 1
on Windows while bash reports 0.
Per Microsoft's PeekNamedPipe docs, when the writer end of the
pipe is closed the call returns FALSE with GetLastError() ==
ERROR_BROKEN_PIPE. Use that as the EOF-ready signal: on a
broken-pipe failure return (available=true, supported=true);
any other failure returns (false, false) so the caller falls
back to the conservative "not available".
Update internalPerPackageSymbols["winpoll"] and
internalAllowedSymbols to allowlist syscall.Errno (for the type
assertion) and golang.org/x/sys/windows.ERROR_BROKEN_PIPE.
Verified: GOOS=windows go build ./... succeeds; analysis tests
and existing read tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@codex conduct a comprehensive security and code review |
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
|
||
| "github.com/DataDog/rshell/builtins" | ||
| ) | ||
|
|
There was a problem hiding this comment.
similar to other commands, can we add a description of the command + description per flag?
| // refuse to run with "variable access is not | ||
| // available", which is the closest analogue to bash's | ||
| // "exec read fails because read is a builtin, not an | ||
| // executable on PATH" behaviour. |
There was a problem hiding this comment.
why this comment is needed?
There was a problem hiding this comment.
The omission of SetVar/GetVar in this struct literal is load-bearing: it's how find -exec isolation works. read checks c.SetVar == nil and refuses to mutate the calling scope (matching bash, which exec-forks per match). The comment exists so a future maintainer doesn't "fix" the missing fields and silently break the isolation. Test: tests/scenarios/cmd/read/hardening/find_exec_isolation.yaml.
| vr := r.writeEnv.Get(name) | ||
| return vr.Str, vr.IsSet() | ||
| }, | ||
| Proc: r.proc, |
There was a problem hiding this comment.
why those changes to interp/runner_exec.go are needed?
There was a problem hiding this comment.
read is the first builtin that mutates parent-shell state, so the runner needs to expose the minimum surface for it:
- SetVar (assign vars in the calling scope — the whole point of read).
- GetVar (consult IFS for field-splitting).
- Translate the internal errTotalVarStorageExceeded sentinel to the public builtins.ErrVarStorageExceeded so the builtin can propagate Result.Exiting on resource-cap exhaustion (matches AST-level setVar semantics).
- Post-call ctx.Err() check so a read -t that hits the run-level deadline surfaces context.DeadlineExceeded to cmd/rshell main instead of the builtin's 142.
@AlexandreYang requested fuller per-command and per-flag descriptions to match the verbose-help builtins. - Replace the single-line package doc with a comprehensive comment block (Usage, Flags table, Field splitting, Backslash escapes, Exit codes, Unsupported flags) modeled after echo.go's package comment. - Enrich the `--help` output handler so `read --help` (and, via builtins/help, `help read`) produces a fuller help screen: brief description → Options (pflag-generated from the existing per-flag descriptions, unchanged) → Field-splitting paragraph → Backslash-escape paragraph → Exit-status table → Unsupported- flag note. No `Help` field is added on the Cmd struct: read registers its own --help handler, and TestHelpFieldOnlyOnNoFlagsCommands enforces that flag-having builtins surface help via --help so `help read` and `read --help` produce identical output. The per-flag descriptions on the pflag.StringP/BoolP/VarP calls were already present and remain unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What does this PR do?
Adds the
readshell builtin to the restricted shell.readconsumes a delimited chunk from stdin and assigns each IFS-split field to a shell variable (defaulting toREPLYwhen no name is given).This is the first builtin in the codebase that mutates parent-shell state, so the change extends the builtin contract with two minimal capabilities on
builtins.CallContext:SetVar(name, value) error— routes through the runner's existingsetVarErr, soMaxVarBytes(1 MiB) andMaxTotalVarsBytes(1 MiB) are enforced for free.GetVar(name) (string, bool)— needed soreadcan consultIFSfor field-splitting.These two together form the minimal complete contract for any future state-aware builtin (
export,unset,shift, …).Motivation
readis a fundamental POSIX builtin. The canonical idiomwhile IFS= read -r line; do …; done < fileis widely used in shell scripts. Until now the shell had no way to express it;readwas explicitly blocked at dispatch time. Lifting that block required granting builtins a way to mutate parent-shell variables, which was a deliberate architectural decision discussed during implementation rather than a one-line capability add.Surface
Supported flags (six +
--help):-r-p PROMPT-d DELIM-d ''means NUL-n NCHARS-N NBYTES-t SECONDS*os.File.SetReadDeadline;context.WithTimeoutfallback. Returns exit 142 on timeout (bash convention: 128 + SIGALRM)Excluded from v1 with rationale in
SHELL_FEATURES.md:-a(needs a parallelSetArrayVarcontract addition; can follow later),-s/-u/-e/-i(interactive-terminal features that don't apply in a non-interactive shell).Memory & DoS hardening
-nand-Nare rejected at flag-parse time when N > 1 MiB or N < 0. The 1 MiB anchor matches the existingMaxLineBytescap incat/headand the runner'sMaxVarBytes/MaxTotalVarsBytes.readInputenforces a separate runtime byte ceiling (also 1 MiB) so a UTF-8-amplified-ncannot exceed the buffer cap.readcalls on the same stdin observe unconsumed bytes (abufio.Readerwould swallow bytes past the delimiter — common bug in naive implementations).MaxVarByteson the assigned value →MaxTotalVarsByteson the aggregate variable store.IFS field-splitting
Implements POSIX
readsemantics:Testing
16 new YAML scenarios under
tests/scenarios/cmd/read/:basic/(7) — single var, default REPLY, multi-var, last-absorbs, fewer-fields, redirect-from-file, EOF-returns-1.flags/(6) — raw vs non-raw, line continuation, delimiter,-n,-N.ifs/(3) — default IFS (space/tab), custom IFS (:), empty IFS (no stripping).errors/(4) —-nand-Ncap rejection, invalid timeout, invalid identifier.help/(1) —--helpexits 0 with usage.The non-bash-comparison scenarios all pass locally. Bash byte-for-byte comparison (
RSHELL_BASH_TEST=1) was not run locally — Docker daemon was not running. Worth verifying in CI.Three pre-existing
helpscenarios were updated for the bumped builtin count (28 → 29). The obsoleteshell/blocked_commands/.../read.yamlscenario was deleted.Symbol allowlists in
analysis/symbols_builtins.goupdated: 10 new symbols added to the global ceiling (context.CancelFunc,context.DeadlineExceeded,fmt.Fprint,os.ErrDeadlineExceeded,os.File,os.ModeCharDevice,time.Now,unicode/utf8.{DecodeLastRuneInString,FullRune,RuneSelf}) plus a new"read":per-command sublist.Reviewer notes
CallContextis the architecturally significant change. Future state-aware builtins should reuseSetVar/GetVarrather than reaching past them.\<delim>is a line continuation,\<X>is just X). ThereadInputfunction handles this with a single helper closure for rune reads.-pis silent in this codebase because stdin is always a pipe in scripted contexts. The TTY check is there for parity in case stdin is ever wired to a terminal by a calling embedder.verified/allowed_symbolslabel automatically — that is reserved for human approval after reviewing the symbol-allowlist changes.Checklist
SHELL_FEATURES.md)make fmtclean,go vetclean, full test suite passes locally🤖 Generated with Claude Code